Fuzzy paraphrases in learning word representations with a lexicon
Authors
Abstract
We identify a pitfall that previous work using lexicons or ontologies to train or improve distributed word representations has not carefully addressed: for polysemous words and for utterances whose meaning changes with context, the paraphrases or related entities listed in a lexicon or ontology are unreliable and can degrade the learned word representations. We propose an approach to address this problem. We treat each paraphrase of a word in a lexicon not as a full paraphrase but as a fuzzy member (a fuzzy paraphrase) of the paraphrase set, whose membership (i.e., degree of truth) depends on the context. We then propose an efficient method that uses fuzzy paraphrases to learn word embeddings. We approximately estimate the local membership of each paraphrase, and train word embeddings jointly with a lexicon by randomly replacing words in the contexts with their paraphrases, subject to the membership of each paraphrase. The experimental results show that our method is efficient, overcomes the weakness of previous related work in extracting semantic information, and outperforms previous methods for learning word representations using lexicons.
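The replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how memberships are estimated, so the values below are hypothetical, and the function name `fuzzy_replace`, the lexicon layout, and the rule of taking the first paraphrase that passes its random draw are all assumptions made for illustration.

```python
import random


def fuzzy_replace(context, lexicon, rng):
    """Return a copy of `context` in which words may be swapped for paraphrases.

    `lexicon` maps a word to a list of (paraphrase, membership) pairs, with
    membership in [0, 1]. A paraphrase is accepted when a uniform draw in
    [0, 1) falls below its membership; the first accepted paraphrase replaces
    the word (an assumed tie-breaking rule). Words without an accepted
    paraphrase are kept unchanged.
    """
    replaced = []
    for word in context:
        choice = word
        for paraphrase, membership in lexicon.get(word, []):
            if rng.random() < membership:
                choice = paraphrase
                break
        replaced.append(choice)
    return replaced


# Hypothetical memberships for "bank" in a financial context.
lexicon = {"bank": [("shore", 0.0), ("lender", 1.0)]}
rng = random.Random(0)
print(fuzzy_replace(["the", "bank", "approved", "the", "loan"], lexicon, rng))
```

With memberships of 0.0 and 1.0 the outcome is deterministic: "shore" is never chosen and "lender" always is. Intermediate values would make the replacement probabilistic, so a paraphrase that fits the context poorly is sampled less often when the modified contexts are fed to an embedding trainer.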
Similar resources
Natural Language Processing With Modular PDP Networks and Distributed Lexicon
An approach to connectionist natural language processing is proposed, which is based on hierarchically organized modular Parallel Distributed Processing (PDP) networks and a central lexicon of distributed input/output representations. The modules communicate using these representations, which are global and publicly available in the system. The representations are developed automatically by all...
Natural Language Processing With Modular PDP Networks and Distributed Lexicon
An approach to connectionist natural language processing is proposed, which is based on hierarchically organized modular parallel distributed processing (PDP) networks and a central lexicon of distributed input/output representations. The modules communicate using these representations, which are global and publicly available in the system. The representations are developed automatically by all...
Natural Language Processing With Modular PDP Networks
An approach to connectionist natural language processing is proposed, which is based on hierarchically organized modular Parallel Distributed Processing (PDP) networks and a central lexicon of distributed input/output representations. The modules communicate using these representations, which are global and publicly available in the system. The representations are developed automatically by all...
Script-Based Inference and Memory Retrieval in
DISCERN is an integrated natural language processing system built entirely from distributed neural networks. It reads short narratives about stereotypical event sequences, stores them in episodic memory, generates fully expanded paraphrases of the narratives, and answers questions about them. Processing in DISCERN is based on hierarchically-organized backpropagation modules, communicating throu...
Joint Word Representation Learning Using a Corpus and a Semantic Lexicon
Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as semantic similarity measurement and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure betw...